From error bounds to the complexity of first-order descent methods for convex functions
Authors
Abstract
This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near the set of minimizers (such as functions with Hölderian growth). A counterexample shows that the equivalence no longer holds for extremely flat functions. This fact reveals the relevance of an approach based on the KL inequality. In a second stage, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results for the globalization of KL inequalities in the convex framework. Our main results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method, and finally compute the complexity using the one-dimensional worst-case proximal sequence. Our method is illustrated through projection methods for feasibility problems and through the famous iterative shrinkage-thresholding algorithm (ISTA), for which we show that the complexity bound is of the form O(q^{2k}), where the constituents of the bound only depend on error bound constants obtained for an arbitrary least squares objective with ℓ1 regularization.
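For concreteness, the ISTA setting the abstract refers to is proximal gradient descent applied to an ℓ1-regularized least squares objective, where the proximal step reduces to soft-thresholding. The following is a minimal illustrative sketch of that standard scheme, not the paper's analysis; the function and parameter names (`ista`, `lam`) are our own.

```python
import numpy as np

def ista(A, b, lam, L=None, iters=500):
    """ISTA (proximal gradient) for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    if L is None:
        # Lipschitz constant of the gradient of the smooth part: ||A||_2^2
        L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)                   # gradient of 0.5*||Ax - b||^2
        z = x - g / L                           # forward (gradient) step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x
```

With A equal to the identity the minimizer is simply the soft-thresholded data, so the scheme reaches it exactly and stays fixed there.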
Similar resources
Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
The success of deep learning has led to rising interest in the generalization properties of the stochastic gradient descent (SGD) method, and stability is one popular approach to studying them. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of SGD in expectation. In this paper, we establish various generalization error bounds...
AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods
Boosting methods are highly popular and effective supervised learning methods which combine weak learners into a single accurate model with good statistical performance. In this paper, we analyze two well-known boosting methods, AdaBoost and Incremental Forward Stagewise Regression (FSε), by establishing their precise connections to the Mirror Descent algorithm, which is a first-order method in...
Lower Bounds for Finding Stationary Points of Non-Convex, Smooth High-Dimensional Functions
We establish lower bounds on the complexity of finding ε-stationary points of smooth, non-convex, high-dimensional functions. For functions with Lipschitz continuous pth derivative, we show that all algorithms—even randomized algorithms observing arbitrarily high-order derivatives—have worst-case iteration count Ω(ε^{−(p+1)/p}). Our results imply that the O(ε^{−2}) convergence rate of gradient descent...
Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems
The iteration complexity of block-coordinate descent (BCD) type algorithms has been under extensive investigation. It was recently shown that for convex problems the classical cyclic BCGD (block coordinate gradient descent) achieves an O(1/r) complexity (r is the number of passes over all blocks). However, such bounds depend at least linearly on K (the number of variable blocks), and are a...
Tight Complexity Bounds for Optimizing Composite Objectives
We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs. randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic...
Journal title:
- Math. Program.
Volume 165, Issue -
Pages -
Publication date 2017